Abstract

We consider the problem of corruption detection on networks. In this model each
vertex of a directed graph can be either truthful or corrupt. Each vertex reports about
the types (truthful or corrupt) of all his out-neighbors. If he is truthful, he reports
the truth, whereas if he is corrupt he reports adversarially. This model, considered in
[15] and motivated by the desire to identify the faulty components of a digital system by
having the other components check them, became known as the PMC model. The
main known results for this model characterize networks in which all corrupt (that is,
faulty) vertices can be identified, when there is a known upper bound on their number.
We are interested in networks in which most of the corrupt vertices
can be identified. We show that the main relevant parameter here is graph expansion.
This implies that, in contrast to the known results about the PMC model (which imply
that in order to identify all corrupt vertices when their number is t, all indegrees have
to be at least t), there are bounded degree graphs in which almost all corrupt and
almost all truthful vertices can be identified, whenever there is a majority of truthful
vertices. We also show that expansion is necessary for obtaining such a corruption
detection and discuss algorithms and the computational hardness of the problem.
1 Introduction


We study the problem of corruption detection on networks. Given a network of agents, a 
subset of whom are corrupt, our goal is to find as many corrupt and non-corrupt agents as 
possible. Neighboring vertices audit each other. We assume that truthful (non-corrupt) 
agents report the status of their neighbors accurately. We make no assumption on the 
report of corrupt agents. For example, two corrupt neighbors may collude and report 
each other as non-corrupt. Similarly, a corrupt vertex may prefer to report the status of 
some of its neighbors accurately, hoping that this will establish a truthful record for itself. 
Moreover, we assume that the corrupt agents may coordinate their actions in an arbitrary 
fashion. 

The corruption model studied here is identical to the model of diagnosable systems that
was introduced by Preparata, Metze and Chien [15] as a model of a digital system with
many components that can potentially fail. It is assumed that components can test some
other components. The goal in [15] and in follow-up work, including [7, 8, 9], is to
characterize networks that can detect a certain number of corrupted nodes and find them.
Similar models were introduced and studied in other areas of computer science, including 
Byzantine computing [10] and intrusion detection in the security community [13]. See also 
the survey [18] and Appendix A here for a nice algorithmic puzzle that resulted from this 
line of work. 

The original motivation for our work is corruption detection in social and economic 
networks, where the main objective is to understand the structure of networks that enable 
one to identify most of the corrupt nodes and most of the truthful ones. We call the 
task of identifying the types of most nodes the corruption detection problem. Examples 
of such networks may include different government agencies in a country, the network 
of banks in the EU or the network of hospitals in a geographic location. Our goal is 
to understand which network structures are more amenable to corruption and which are 
more robust against it. Social scientists have studied many aspects of corruption networks,
see e.g. [14, 16, 6]. However, to the best of our knowledge, prior to this work there has been no
systematic study of the effect of the network structure on corruption detection.

1.1 Formal Definitions and Main Results 

Consider a network of agents represented by a finite directed graph G = (V, E). Each
vertex can be either truthful, or corrupt. We denote by B the set of corrupt agents and
by T the set of truthful ones. Thus V = T ∪ B and T ∩ B = ∅. For each vertex u and each of
its out-neighbors v, u examines v and reports about his type. If u is truthful, he reports
the truth, that is, reports that v ∈ T if indeed v ∈ T and reports that v ∈ B if v ∈ B. If
u ∈ B, then he reports adversarially, independently of the actual nature of v. We assume
that the corrupt vertices can cooperate in an arbitrary fashion. The question we address
is under what conditions on the graph G and the number of truthful vertices it is possible
to identify almost all truthful vertices and almost all corrupt ones, with certainty. It is
easy to see that this is impossible if |T| ≤ |B|. Indeed, if V = V_1 ∪ V_2 ∪ W is a partition of
V into 3 pairwise disjoint sets where |V_1| = |V_2| (and W may be empty), then the corrupt
agents can ensure that all the reports in the two scenarios T = V_1, B = V_2 ∪ W and
T = V_2, B = V_1 ∪ W will be identical (for instance, each vertex of V_1 reports that exactly
its out-neighbors in V_1 are truthful, each vertex of V_2 does the same with respect to V_2,
and each vertex of W reports all its out-neighbors as corrupt). As there is no common
truthful agent in these two possibilities, no deterministic algorithm can locate a truthful
agent with no error.
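The indistinguishability argument can be checked mechanically on a toy instance. The following Python sketch is only an illustration under our own assumptions: reports are stored in a dictionary keyed by the directed pair (u, v) with value 'T' or 'B', and the helper names `split_reports` and `consistent` are hypothetical. Every vertex of V_1 vouches exactly for V_1, every vertex of V_2 vouches exactly for V_2, and W accuses everyone; both scenarios are then consistent with the same reports.

```python
def split_reports(edges, V1, V2):
    """Reports that make T = V1 and T = V2 indistinguishable (hypothetical helper).

    edges -- directed edges (u, v): u reports on v
    Returns a dict mapping (u, v) to 'T' or 'B'.
    """
    rep = {}
    for u, v in edges:
        if u in V1:
            rep[(u, v)] = 'T' if v in V1 else 'B'   # V1 vouches exactly for V1
        elif u in V2:
            rep[(u, v)] = 'T' if v in V2 else 'B'   # V2 vouches exactly for V2
        else:
            rep[(u, v)] = 'B'                       # W accuses everyone
    return rep


def consistent(edges, rep, T):
    """True iff every vertex of T reports the true type of each out-neighbour."""
    return all(rep[(u, v)] == ('T' if v in T else 'B')
               for u, v in edges if u in T)


# Complete symmetric digraph on 5 vertices, V1 = {0, 1}, V2 = {2, 3}, W = {4}.
edges = [(u, v) for u in range(5) for v in range(5) if u != v]
rep = split_reports(edges, {0, 1}, {2, 3})
print(consistent(edges, rep, {0, 1}), consistent(edges, rep, {2, 3}))  # → True True
```

By contrast, a candidate such as T = {0, 1, 2} fails the check, since vertex 2 reports vertex 0 as corrupt.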

Our main result is that if the graph is a good bounded degree directed expander, in the 
sense described below, and we have a majority of truthful agents, it is possible to identify 
most of the truthful agents, whereas if it is far from being an expander this is impossible 
even if the number of truthful vertices is guaranteed to be at least 0.99|V|.

We first consider the case of symmetric directed graphs, that is, graphs in which 
(u, v) is a directed edge iff (v, u) is such an edge. This case is somewhat simpler, and is
equivalent to considering G as an undirected graph in which each vertex reports about all 
his neighbors. 

For a positive δ < 1/8, call a graph G = (V, E) on a set of n vertices a δ-good expander
if any set U of at most 2δn vertices has more than |U| neighbors outside U, and there
is an edge between any pair of sets of vertices provided one of them is of size at least
δn and the other is of size at least n/4. Standard results about expanders (see, e.g., [1],
Corollary 1) imply that this holds for Ramanujan graphs or random regular graphs with
degrees at least c/δ for an appropriately chosen absolute constant c. The main result for
the undirected case is the following.

Theorem 1.1 Let G = (V, E) be a δ-good expander and suppose V = T ∪ B, T ∩ B = ∅
and |T| > |B|. Then when getting the reports of each vertex of G about all its neighbors
we can identify a subset T' ⊆ T and a subset B' ⊆ B so that |T' ∪ B'| > (1 − δ)n. That
is, we will be able to recover the type of almost all vertices of G.

Moreover, if |T| > (1/2 + δ)n then there is a linear time algorithm that identifies
subsets as above from the given reports.

We note that the algorithm in the proof of the theorem is exponential if we only assume
that |T| > |B| (or if we assume that |T| > (1/2 + μ)n for a very small fixed μ = μ(δ)).
The fact that the detection algorithm is not efficient when we only assume that T is just a
little bit bigger than B is not a coincidence. Indeed, the algorithm described in the proof
of the theorem, presented in the next section, provides a set T of more than n/2 truthful
agents which is consistent with the reports obtained, whenever such a set exists. We show
that the problem of producing such a set when it exists is NP-hard, even when restricted
to bounded degree expanders (and even if we ensure that there is such a set of size at least
n/2 + ηn).

Theorem 1.2 For any δ > 0 there exists an η > 0 such that the following promise problem
is NP-hard. The input is a graph G = (V, E) with |V| = n, which is a δ-good expander,
along with the status of u reported by v and vice versa for every edge e = {u, v} ∈ E. The
promise is that either

• There exists a partition V = T ∪ B which is consistent with all of the reported
values and |T| ≥ n/2 + ηn, or

• All partitions V = T ∪ B which are consistent with the reported values satisfy
|T| ≤ n/2 − ηn.

The objective is to distinguish between the two options above. 

We also prove the following, which shows that expansion is essentially necessary for solving 
the detection problem. 

Theorem 1.3 Let G = (V, E) be a graph on n vertices so that it is possible to remove at
most εn vertices of G and get a graph in which any connected component is of size at most
εn. Then even knowing that V = T ∪ B with T ∩ B = ∅ and |T| ≥ (1 − 2ε)n there is no
deterministic algorithm that identifies even a single member t ∈ T from the reports of all
vertices. In particular, this is the case for planar graphs or graphs with a fixed excluded
minor even if ε = O(n^{−1/3}).

1.2 Results for directed graphs 

Next we consider directed graphs. This is motivated by the fact that in various auditing 
situations it is unnatural to allow u to inspect v whenever v inspects u. In fact, it may 
even be desirable not to allow any short cycles in the directed inspection graph. For a
fixed δ < 1/16, call a directed graph G = (V, E) on n vertices a δ-good directed expander
if the following conditions hold.

(i) For any set U ⊆ V of size at most 4δn, |N⁺(U) − U| > |U|, where N⁺(U) is the set of
all out-neighbors of U.
(ii) For any two disjoint sets of vertices A and B so that |A| ≥ δn and |B| ≥ n/4 there is
at least one directed edge from A to B and at least one directed edge from B to A.

We first show that for any fixed positive δ < 1/16 there are bounded degree δ-good
directed expanders which contain no short cycles (even ignoring the orientation of edges).

Lemma 1.4 There are two absolute positive constants c_1, c_2 so that for any fixed 0 < δ <
1/16 there is a constant d ≤ c_1/δ and infinitely many values of n for which there is a
δ-good directed expander on n vertices in which the total degree of each vertex is d and
there is no cycle of length smaller than c_2 log n / log d (of any orientation).

Theorem 1.5 Let G = (V, E) be a δ-good directed expander and suppose V = T ∪ B,
T ∩ B = ∅ and |T| > |B|. Then when getting the reports of each vertex of G about all its
out-neighbors we can identify a subset T' ⊆ T and a subset B' ⊆ B so that |T' ∪ B'| > (1 − δ)n.
That is, we will be able to recover the type of almost all vertices of G.

Moreover, if |T| > (1/2 + 2δ)n then there is a linear time algorithm that identifies
subsets as above from the given reports. If we only assume that |T| > |B| then the detection
algorithm is exponential.

1.3 Novelty and Comparison to Previous Work

The vast literature on corruption detection in computer science, and in particular on the
diagnosable system problem and the PMC model introduced in [15], deals either with the
problem of identifying all corrupt nodes, or with that of identifying a single corrupt node.
As observed in [15], a necessary condition for the identification of all corrupt nodes in 
a network with t corrupt nodes is that the minimal indegree in the network is at least 
t. Therefore, if the number of corrupt nodes is linear in the total number of vertices, all 
indegrees have to be linear, and the total number of edges has to be quadratic. 

The main contribution of the present work is a proof that the number of required edges 
may be much smaller when relaxing the requirement of identifying all corrupt nodes and 
replacing it by the requirement of the identification of most good and most corrupt nodes. 
By relaxing the requirement as above we are able to study bounded degree graphs. Our 
main new result is that a linear number of edges ensures the detection of almost all corrupt 
and almost all truthful vertices, provided the graph is a sufficiently strong expander. It 
was shown already in [15] that a linear number of edges suffices to ensure the detection of 
a single corrupt vertex. We show that such a small number of edges suffices to determine 
the types of almost all vertices, even when the number of truthful vertices exceeds that of 
corrupt ones by only 1. 
Our results are of natural interest in many of the motivating examples for the corruption
detection problem:

• In a distributed computer network of bounded (average / minimal) degree, they allow one
to find a good fraction of the network that functions properly even when a positive
fraction of the network is corrupt due to hardware problems / intrusion / viruses
etc.

• Similarly, in auditing social networks our results allow one to identify a large fraction of
the corrupt / good nodes even in networks of bounded degree.

Our results highlight the role of graph expansion in the context of corruption detection. 
Indeed, we show not only that graph expansion, when defined appropriately, is sufficient
for corruption detection, but also that it is necessary.

1.4 Techniques 

The proofs rely crucially on the existence and properties of strong bounded degree
expanders, see [2], [11], [1] and their references. By combining the known results with
appropriate probabilistic arguments we establish the existence of strong bounded degree
directed expanders with no short cycles, and use them to show that the corruption
detection problem can be solved in such networks as well. Combining the observation that
expansion is necessary for corruption detection with the planar separator theorem of
Lipton and Tarjan and its extensions, we conclude that planar graphs and graphs with a fixed
excluded minor are not good for corruption detection. Finally, we discuss the algorithmic
aspects of our problem using results about hardness of approximation.

2 Proofs 

2.1 Undirected graphs 

Proof of Theorem 1.1: Let H be the spanning subgraph of G in which a pair of vertices
u and v is connected iff u reports that v ∈ T and v reports that u ∈ T. Let V_1, V_2, ..., V_s
be the sets of vertices of the connected components of H.

Claim 2.1 All the vertices of each V_i are of the same type, that is, for each 1 ≤ i ≤ s,
either V_i ⊆ T or V_i ⊆ B.
Proof: Suppose u and v are neighbors in H. If u ∈ T then v ∈ T (as u reports so). If
u ∈ B, then v ∈ B (as v reports that u ∈ T). Hence the type is constant along every edge
of H, and therefore on each connected component of H. □

Call a component of H truthful if it is a subset of T, else it is a subset of B and we 
call it corrupt. 

Let H' be the induced subgraph of G on the set T of all good vertices. 

Claim 2.2 Any connected component of H' is also a connected component of H.

Proof: If u, v ∈ T are adjacent in G (and hence in H'), they are adjacent in H as well, by
definition and by the fact that each of them reports honestly about his neighbors. Thus
each component C' of H' is contained in a component C of H. However, no u ∈ T is
adjacent in H to a vertex w ∈ B (as u reports truthfully that w ∈ B), implying that in fact
C' = C and establishing the assertion of the claim. □

Claim 2.3 The graph H' contains a connected component of size at least |T| − δn >
(1/2 − δ)n.

Proof: Assume this is false and the largest connected component of H' is on a set of
vertices U_1 of size smaller than |T| − δn. Since the total number of vertices of H' is
|T| > n/2, it is easy to check that one can split the connected components of H' into two
disjoint sets, each of total size at least δn. However, the bigger among the two is of size
bigger than n/4, and hence, since G is a δ-good expander, there is an edge of G between
the two groups. This is impossible, as it means that there is an edge of G between two
distinct connected components of H'. □

The analysis so far allows us to prove the easy part of the theorem.

Claim 2.4 If |T| > (1/2 + δ)n then there exists a linear time algorithm which finds T' ⊆ T
and B' ⊆ B such that |T' ∪ B'| > (1 − δ)n.

Proof: Note that if |T| > (1/2 + δ)n then Claim 2.3 implies that H must contain a
connected component of size bigger than n/2, which must be truthful. Thus, if this is
the case, more than n/2 of the truthful vertices of G can be identified by the simple,
linear time algorithm that computes the connected components of H. Moreover, since all
vertices but fewer than δn are among their neighbors, this enables us to identify the types of
all vertices besides fewer than δn. □
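The linear time procedure of Claim 2.4 amounts to one graph search. Below is a minimal Python sketch, assuming a hypothetical report encoding (a dictionary `rep` mapping each ordered pair (a, b) of adjacent vertices to 'T' or 'B') and assuming, as in the claim, that |T| > (1/2 + δ)n. The function name is ours, and the sketch does not verify that G is a δ-good expander; it simply returns None when H has no component with more than n/2 vertices.

```python
from collections import deque


def detect_undirected(n, edges, rep):
    """Sketch of the linear time step of Claim 2.4.

    n     -- number of vertices, labelled 0..n-1
    edges -- undirected edges (u, v) of G, each listed once
    rep   -- rep[(a, b)] is a's report on b, 'T' or 'B', for both directions
    Returns (T', B'), or None if H has no component with more than n/2 vertices.
    """
    # H keeps an edge of G iff both endpoints report each other as truthful.
    adj_h = [[] for _ in range(n)]
    for u, v in edges:
        if rep[(u, v)] == 'T' and rep[(v, u)] == 'T':
            adj_h[u].append(v)
            adj_h[v].append(u)
    # Label the connected components of H by breadth-first search.
    comp, sizes = [-1] * n, []
    for s in range(n):
        if comp[s] != -1:
            continue
        comp[s] = len(sizes)
        queue, size = deque([s]), 0
        while queue:
            x = queue.popleft()
            size += 1
            for y in adj_h[x]:
                if comp[y] == -1:
                    comp[y] = comp[s]
                    queue.append(y)
        sizes.append(size)
    big = max(range(len(sizes)), key=sizes.__getitem__)
    if 2 * sizes[big] <= n:
        return None
    # With |B| < n/2, a component with more than n/2 vertices cannot be corrupt.
    core = {x for x in range(n) if comp[x] == big}
    t_prime, b_prime = set(core), set()
    # Truthful vertices reveal the types of all their neighbours.
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            if a in core:
                (t_prime if rep[(a, b)] == 'T' else b_prime).add(b)
    return t_prime, b_prime
```

Both passes touch each edge a constant number of times, which is where the linear running time comes from.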

It remains to show that even if we only assume that |T| > n/2 then we can still 
identify correctly most of the truthful vertices. We proceed with the proof of this stronger 
statement. 
By Claims 2.2 and 2.3 it follows that H contains at least one connected component
of size at least (1/2 − δ)n > 3n/8. If H contains only one such component, then this
component must consist of truthful agents, and we can identify all of them. Otherwise,
there is another connected component of size at least (1/2 − δ)n, and as there is no room for
more than two such components, there are exactly two of them, say V_1 and V_2. Note that
by the properties of G there are edges of G between V_1 and V_2, and hence it is impossible
that both of them are truthful components. As one of them must be truthful, it follows
that exactly one of V_1 and V_2 is a truthful component and the other is corrupt. We next
show that we can identify the types of both components.

Construct an auxiliary weighted graph S on the set of vertices 1, 2, ..., s representing
the connected components V_1, V_2, ..., V_s as follows. The weight w_i of i is defined by
w_i = |V_i|/n. Two vertices i and j are connected iff there is at least one edge of G that connects
a vertex in V_i with one in V_j. Call an independent set in the graph S large if its total
weight is bigger than 1/2. Note that by the discussion above T must be a union of the
form T = ⋃_{i∈I} V_i, where I is a large independent set in the graph S. In order to complete
the argument we prove the following.

Claim 2.5 Either there is no large independent set in S containing 1, or there is no large 
independent set in S containing 2. 

Proof: Assume this is false, and let I_1 be a large independent set in S containing 1,
and I_2 a large independent set in S containing 2. To get a contradiction we show that
for w(I_1) = Σ_{i∈I_1} w_i and w(I_2) = Σ_{i∈I_2} w_i we have w(I_1) + w(I_2) ≤ 1 (and hence it is
impossible that each of them has total weight bigger than a half).

To prove the above note, first, that the two vertices 1 and 2 of S are connected (as
each corresponds to a set of more than (1/2 − δ)n vertices of G, hence there are edges of
G connecting V_1 and V_2). Therefore I_1 must contain 1 but not 2, and I_2 contains 2 but
not 1.

If there are any vertices i of S connected in S both to 1 and to 2, then these vertices
belong to neither I_1 nor I_2, as these are independent sets. Similarly, if a vertex i is
connected to 1 but not to 2, then it can belong to I_2 but not to I_1, and the symmetric
statement holds for vertices connected to 2 but not to 1. So far we have discussed only
vertices that can belong to at most one of the two independent sets I_1 and I_2. If this
is the case for all the vertices of S, then each of them contributes its weight only to one
of the two sets and their total weight would thus be at most 1, implying that it cannot
be that the weight of each of them is bigger than 1/2, and completing the proof of the
claim. It thus remains to deal with the vertices of S that belong to both I_1 and I_2. Let
J ⊆ {3, 4, ..., s} be the set of all these vertices. Note, first, that the total weight of the
vertices in J is at most 2δ, as the total weight of 1 and 2 is at least 2(1/2 − δ) = 1 − 2δ.
Note also that by the discussion above each j ∈ J is not a neighbor of 1 or of 2. By the
assumption about the expander G the total weight of the vertices that are neighbors of
vertices in J and do not belong to J is bigger than the total weight of the vertices in J.
Indeed, this is the case as the set ⋃_{j∈J} V_j has at most 2δn vertices, and hence the number
of its neighbors in G that do not lie in it is bigger than its size. We thus conclude that if
J' = N_S(J) − J is the set of neighbors of J that do not belong to J, then the total weight of
the vertices in J' exceeds the total weight of the vertices in J, and the vertices in J' belong
to neither I_1 nor I_2. We have thus proved that the sum of weights of the two independent
sets I_1 and I_2 satisfies

$$w(I_1) + w(I_2) \le 2w(J) + \bigl(1 - w(J) - w(J')\bigr) \le w(J) + w(J') + \bigl(1 - w(J) - w(J')\bigr) = 1,$$

contradicting the fact that both I_1 and I_2 are large. This completes the proof of the claim.
□

By Claim 2.5 we conclude that one can identify the types of the components V_1 and
V_2. This means that we can identify at least (1/2 − δ)n truthful vertices with no error.
Recall that this is the case also when H has only one connected component of size at least
(1/2 − δ)n. Having these truthful vertices, we also know the types of all their neighbors.
By the assumption on G this gives the types of all vertices but fewer than δn, completing
the proof of the main part of the theorem.

The fact that the algorithm is linear provided |T| > (1/2 + δ)n was established in Claim 2.4.
However, if we only assume that |T| > |B| the proof provides only a non-efficient algorithm
for deciding the types of the components V_1 and V_2. Indeed, we have to compute the
maximum weight of an independent set containing 1 in the weighted graph S, and the
maximum weight of an independent set containing 2. By the proof above, exactly one of
these maxima is at most 1/2, and the corresponding component is corrupt, providing the
required types. □
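The exponential step can be sketched as follows. This is a hypothetical illustration rather than the authors' implementation: it stores the integer sizes |V_i| instead of the weights w_i = |V_i|/n (so "large" becomes 2·size > n, which avoids floating point), represents S by adjacency sets, and finds the maximum independent set containing a given vertex by brute force over subsets.

```python
from itertools import combinations


def max_is_containing(sizes, adj, root):
    """Largest total size of an independent set of S that contains root.

    sizes -- sizes[i] = |V_i| (integer component sizes, i.e. n * w_i)
    adj   -- adj[i] = set of components joined to i by some edge of G
    Brute force: exponential in the number of components.
    """
    # Only components that are neither root nor adjacent to it can join it.
    free = [i for i in range(len(sizes)) if i != root and i not in adj[root]]
    best = sizes[root]
    for r in range(1, len(free) + 1):
        for extra in combinations(free, r):
            # keep only choices that are independent in S
            if all(b not in adj[a] for a, b in combinations(extra, 2)):
                best = max(best, sizes[root] + sum(sizes[i] for i in extra))
    return best


def truthful_component(sizes, adj, n, c1, c2):
    """Decide which of the two big components c1, c2 is truthful.

    By Claim 2.5 at most one of them extends to a large independent set
    (total size > n/2); the truthful one does, since T itself is such a set.
    """
    return c1 if 2 * max_is_containing(sizes, adj, c1) > n else c2
```

For example, with sizes [4, 4, 1], n = 9, component 0 adjacent to 1 and component 1 adjacent to 2, component 0 extends to {0, 2} of total size 5 > 9/2, while component 1 cannot grow beyond 4, so component 0 is declared truthful.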

We next show that the non-efficiency of the algorithm is necessary. 

Proof of Theorem 1.2: The proof is based on the following fact [5]: there exist constants
b < a < 1/2 such that deciding if a graph H on m vertices, all of whose degrees are bounded
by 4, has a maximum independent set of size at least (a + b)m or at most (a − b)m is
NP-hard.

Let G' be a δ-good bounded degree expander on a set V of n vertices. Split the
vertices into 3 disjoint sets V_1, V_2, V_3, where V_3 is an independent set in G' of size m,
with bm = ηn, all its neighbors are in V_2, |V_1| = n/2 − am and |V_2| = n/2 − m + am.
Add on V_3 a bounded degree graph H as above, in which it is hard to decide if the
maximum independent set is of size at least (a + b)m or at most (a − b)m. That is, identify
the set of vertices of H with V_3 and add edges between the vertices of V_3 as in H. Call the
resulting graph G and note that it is a δ-good expander (as so is its spanning subgraph
G').

The reports of the vertices are as follows. Each vertex in V_1 reports truthful on each
neighbor it has in V_1, and corrupt on any other neighbor. Similarly, each vertex of V_2
reports truthful on any neighbor it has in V_2 and corrupt on any other neighbor, and each
vertex in V_3 reports corrupt on all its neighbors. Note that with these reports the connected
components of the graph H in the proof of Theorem 1.1 are V_1, V_2 and every singleton in
V_3.

It is easy to check that here if H has an independent set I of size at least (a + b)m, then
G has a set T of truthful vertices of size at least n/2 + bm, namely, the set I ∪ V_1, which is
consistent with all reports. If H has no independent set of size bigger than (a − b)m, then
G does not admit any set T of truthful vertices of size bigger than n/2 − bm consistent
with all reports. This completes the proof. □
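On a toy instance one can check the key step of the reduction: an independent set I of H yields the consistent truthful set I ∪ V_1. The sketch below is illustrative only. The toy graph is not an expander and its part sizes do not satisfy |V_1| = n/2 − am, so it exercises the report rule and the consistency test, not the hardness statement; the report encoding and the helper names are our own assumptions.

```python
def reduction_reports(edges, V1, V2, V3):
    """Reports used in the reduction (edges are undirected pairs, listed once)."""
    rep = {}
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            if a in V1:
                rep[(a, b)] = 'T' if b in V1 else 'B'   # V1 vouches for V1 only
            elif a in V2:
                rep[(a, b)] = 'T' if b in V2 else 'B'   # V2 vouches for V2 only
            else:
                rep[(a, b)] = 'B'                       # V3 accuses everyone
    return rep


def consistent(edges, rep, T):
    """Each truthful endpoint must report the true type of the other one."""
    return all(rep[(a, b)] == ('T' if b in T else 'B')
               for u, v in edges for a, b in ((u, v), (v, u)) if a in T)


# Toy instance: V3's neighbours in G' lie in V2; H is the path 5 - 6 - 7 on V3.
V1, V2, V3 = {0, 1}, {2, 3, 4}, {5, 6, 7}
g_prime = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (2, 4), (5, 2), (6, 3), (7, 4)]
h_edges = [(5, 6), (6, 7)]
edges = g_prime + h_edges
rep = reduction_reports(edges, V1, V2, V3)
```

Here {5, 7} is independent in H, and V_1 ∪ {5, 7} is consistent with every report; the non-independent choice {5, 6} is not, because vertex 5 accuses vertex 6.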

Proof of Theorem 1.3: Let B' be a set of at most εn vertices of G whose removal
splits G into connected components with vertex classes V_1, V_2, ..., V_s, each of size at most
εn. Consider the following s possible scenarios R_i, for 1 ≤ i ≤ s.

R_i: the set of corrupt vertices is B = B' ∪ V_i, all the others are good vertices. The
vertices in B' report that all their neighbors are corrupt. The vertices in V_i report that
their neighbors in V_i are in T, and that all their other neighbors are in B. (The truthful
vertices, of course, report truthfully about all their neighbors.)

It is not difficult to check that in all these s scenarios, all vertices make exactly the same
reports: each vertex of B' reports all its neighbors as corrupt, and each vertex of any V_j
reports its neighbors in V_j as truthful and all its other neighbors, which lie in B', as
corrupt. On the other hand, there is no vertex of G that is truthful in all these scenarios,
hence no deterministic algorithm can identify a truthful vertex with no error. Since the
number of corrupt vertices in all scenarios is at most 2εn, the first assertion of the theorem
follows. The claim regarding planar graphs and graphs with excluded minors follows from
the results in [12], [3]. □
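The scenarios R_i can be checked on a small example. In the following hypothetical Python sketch (our own helper names and report encoding), a path on 7 vertices with separator B' = {3} splits into two components; the common reports are built once, both scenarios are verified to be consistent with them, and no vertex is truthful in both.

```python
def components_without(n, edges, sep):
    """Connected components of G - sep (union-find with path halving)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        if u not in sep and v not in sep:
            parent[find(u)] = find(v)
    groups = {}
    for x in range(n):
        if x not in sep:
            groups.setdefault(find(x), set()).add(x)
    return list(groups.values())


def common_reports(edges, sep, comps):
    """Reports shared by all scenarios R_i: separator vertices accuse every
    neighbour; a vertex of a component vouches exactly for its own component."""
    home = {x: k for k, comp in enumerate(comps) for x in comp}
    rep = {}
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            if a in sep:
                rep[(a, b)] = 'B'
            else:
                rep[(a, b)] = 'T' if home.get(b) == home[a] else 'B'
    return rep


def consistent(edges, rep, T):
    """Each truthful endpoint must report the true type of the other one."""
    return all(rep[(a, b)] == ('T' if b in T else 'B')
               for u, v in edges for a, b in ((u, v), (v, u)) if a in T)


n = 7
edges = [(i, i + 1) for i in range(6)]      # the path 0 - 1 - ... - 6
sep = {3}
comps = components_without(n, edges, sep)   # {0, 1, 2} and {4, 5, 6}
rep = common_reports(edges, sep, comps)
# Scenario R_i: B = sep ∪ comps[i], everything else truthful.
truthful_sets = [set(range(n)) - sep - c for c in comps]
```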

3 Directed Graphs 

Here we provide the proofs for the case of directed graphs. 
Proof of Lemma 1.4: The graphs constructed are orientations of undirected expanders.
Here are the details. Let G = (V, E) be a d-regular undirected non-bipartite Ramanujan
graph, where d = O(1/δ) (see [11]). This is a Cayley graph with d/2 generators, and its
girth is bigger than c_2 log n / log d. Let E_1 denote all edges corresponding to, say,
k = ⌈3√d⌉ of the generators and their inverses. Thus (V, E_1) is a 2k-regular graph; take an
arbitrary Eulerian orientation of it (an orientation where each vertex has in-degree and
out-degree k). Orient the rest of the edges randomly, that is, for each edge e ∈ E − E_1 choose,
randomly, independently and uniformly, one of the two possible orientations. As shown
in [2], the average degree in the induced subgraph of G on any set of γn vertices does not
exceed γd + 2√(d − 1) < 3√d provided γ < 1/√d. In particular, if γ = 8δ and d < 1/γ²
(which holds in our case, as d = O(1/δ)), the above inequality holds. Now if U is any set
of at most γn/2 vertices, and U' = N⁺(U) − U satisfies |U'| ≤ |U|, then the set U ∪ U'
is of size at most γn but contains at least k|U| ≥ k(|U| + |U'|)/2 edges, namely all the
edges of E_1 emanating from some vertex of U. This means that the average degree in the
induced subgraph on U ∪ U' is at least k ≥ 3√d, which is impossible. This shows that
our directed graph satisfies property (i) (independently of the orientation of the edges in
E − E_1). To prove that (ii) holds with high probability, note that for any fixed disjoint sets
of vertices A and B of sizes |A| ≥ δn and |B| ≥ n/4, the expander mixing lemma (cf.,
e.g., [4], Corollary 9.2.5) implies that there are more than 2n edges of E − E_1 connecting
A and B, provided d is at least some c/δ. The probability that all these edges are directed
from A to B, or that all of them are directed from B to A, is smaller than 2 · 2^{−2n}. As the
number of choices for the pair of sets A and B is at most 3^n, which is much smaller than
2^{2n}/2, we conclude that our oriented graph satisfies (ii) as well with high probability,
completing the proof of the lemma. □
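The orientation step of this construction can be sketched in Python. The sketch is a hypothetical illustration: it takes the edge sets E_1 and E − E_1 as plain lists and does not construct the Ramanujan graph itself. It orients E_1 along Euler circuits using an iterative version of Hierholzer's algorithm, which requires every vertex of (V, E_1) to have even degree (as it does for the 2k-regular graph in the proof), and orients every remaining edge by an independent fair coin.

```python
import random
from collections import defaultdict


def eulerian_orientation(edges):
    """Orient an undirected multigraph whose degrees are all even so that every
    vertex has in-degree equal to out-degree (iterative Hierholzer walk).

    edges -- list of undirected pairs (u, v); returns a list of arcs (x, y).
    """
    incident = defaultdict(list)       # vertex -> ids of incident edges
    for eid, (u, v) in enumerate(edges):
        incident[u].append(eid)
        incident[v].append(eid)
    used = [False] * len(edges)
    pos = defaultdict(int)             # first possibly unused slot per vertex
    arcs = []
    for start in list(incident):
        stack = [start]
        while stack:
            x = stack[-1]
            # Skip edges at x that were already traversed from the other end.
            while pos[x] < len(incident[x]) and used[incident[x][pos[x]]]:
                pos[x] += 1
            if pos[x] == len(incident[x]):
                stack.pop()            # x is exhausted: backtrack
                continue
            eid = incident[x][pos[x]]
            used[eid] = True
            u, v = edges[eid]
            y = v if x == u else u
            arcs.append((x, y))        # orient the edge in walking direction
            stack.append(y)
    return arcs


def orient(e1, rest, seed=None):
    """Eulerian orientation on e1, independent fair coin flips on rest."""
    rng = random.Random(seed)
    arcs = eulerian_orientation(e1)
    for u, v in rest:
        arcs.append((u, v) if rng.random() < 0.5 else (v, u))
    return arcs
```

With even degrees every maximal walk closes up at the vertex where it started, so each vertex is entered exactly as often as it is left; this is why orienting edges in walking direction balances in- and out-degrees.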

Proof of Theorem 1.5: The proof resembles that of Theorem 1.1 but requires several 
additional ideas. 

Let H be the spanning subgraph of G in which an edge (u, v) of G is an edge of H iff
u reports that v ∈ T. Let V_1, V_2, ..., V_s be the sets of vertices of the strongly connected
components (SCCs, for short) of H.

Claim 3.1 All the vertices of each V_i are of the same type, that is, for each 1 ≤ i ≤ s,
either V_i ⊆ T or V_i ⊆ B.

Proof: If u ∈ T and v is an out-neighbor of u in H, then v ∈ T (as u reports so). If
v ∈ B and u is an in-neighbor of v in H, then u ∈ B (as u reports that v ∈ T). Thus if one
vertex of an SCC is truthful, so is every vertex reachable from it in H, and in particular
every vertex of its SCC. □

Call an SCC of H truthful if it is a subset of T, else it is a subset of B and we call it
corrupt.

Let H' be the induced subgraph of G on the set T of all truthful vertices. 

Claim 3.2 Any SCC of H' is also an SCC of H.

Proof: If u, v ∈ T and (u, v) is an edge of G, then it is an edge of H too. Thus each SCC
C' of H' is contained in an SCC C of H. This SCC is truthful, by Claim 3.1, and cannot
contain any additional truthful vertices as otherwise these belong to C' as well. □

Claim 3.3 The graph H' contains an SCC of size at least |T| − 2δn > (1/2 − 2δ)n.

Proof: Consider the component graph of H': this is the directed graph F whose vertices
are all the SCCs of H', where there is a directed edge from C to C' iff there is some edge of
H' from some vertex of C to some vertex of C'. It is easy and well known that this graph is
a directed acyclic graph, and hence there is a topological order of it, that is, a numbering
C_1, C_2, ..., C_r of the components so that all edges between different components are of
the form (C_i, C_j) with i < j. Order the vertices of H' in a linear order according to this
topological order, where the vertices of C_1 come first (in an arbitrary order), those of C_2
afterwards, etc. Let u_i be the vertex in place i according to this order (1 ≤ i ≤ |T|). If
the vertices u_{δn} and u_{|T|−δn+1} belong to the same SCC, then this component is of size at
least |T| − 2δn and we are done. Otherwise, the SCC containing u_{⌈|T|/2⌉} differs from either
that containing u_{δn} or from that containing u_{|T|−δn+1}. In the first case, the set A of all
vertices in the SCCs up to that containing u_{δn} is of size at least δn, and the set B of all
vertices in the SCCs starting from that containing u_{⌈|T|/2⌉} is of size at least |T|/2 > n/4,
and there is no edge of G directed from B to A, contradicting property (ii) of G. The second
case leads to a symmetric contradiction, establishing the claim. □

Note that the above shows that if |T| > (1/2 + 2δ)n then H' and hence also H must
contain an SCC of size bigger than n/2, which must be truthful. Thus, if this is the case,
more than n/2 of the truthful vertices of G can be identified by the known linear time
algorithm that computes the strongly connected components of H ([19], see also [17]). In
addition, since all vertices but fewer than δn are among their out-neighbors, this enables us
to identify the types of all vertices besides fewer than δn.
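A minimal Python sketch of this linear time step, assuming a hypothetical report encoding (a dictionary from arcs (u, v) of G to 'T' or 'B') and |T| > (1/2 + 2δ)n. We use Kosaraju's two-pass algorithm for the strongly connected components purely for brevity; any linear time SCC algorithm, such as the one cited in the text, would serve. Function names are ours.

```python
from collections import Counter


def sccs(n, arcs):
    """Strongly connected components by Kosaraju's algorithm (iterative).
    Returns comp, where comp[x] is the component index of vertex x."""
    out = [[] for _ in range(n)]
    inc = [[] for _ in range(n)]
    for u, v in arcs:
        out[u].append(v)
        inc[v].append(u)
    order, seen = [], [False] * n
    for s in range(n):                 # first pass: record finishing order
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, 0)]
        while stack:
            x, i = stack.pop()
            if i < len(out[x]):
                stack.append((x, i + 1))
                y = out[x][i]
                if not seen[y]:
                    seen[y] = True
                    stack.append((y, 0))
            else:
                order.append(x)        # x is finished
    comp, c = [-1] * n, 0
    for s in reversed(order):          # second pass on the reversed graph
        if comp[s] != -1:
            continue
        comp[s] = c
        stack = [s]
        while stack:
            x = stack.pop()
            for y in inc[x]:
                if comp[y] == -1:
                    comp[y] = c
                    stack.append(y)
        c += 1
    return comp


def detect_directed(n, arcs, rep):
    """H keeps the arcs (u, v) on which u reports v as truthful. An SCC of H
    with more than n/2 vertices is truthful, and its out-reports are trusted."""
    comp = sccs(n, [(u, v) for u, v in arcs if rep[(u, v)] == 'T'])
    big, size = Counter(comp).most_common(1)[0]
    if 2 * size <= n:
        return None
    core = {x for x in range(n) if comp[x] == big}
    t_prime, b_prime = set(core), set()
    for u, v in arcs:
        if u in core:
            (t_prime if rep[(u, v)] == 'T' else b_prime).add(v)
    return t_prime, b_prime
```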

We next show that even if we only assume that |T| > n/2 we can still identify correctly
most of the truthful vertices.

By the last two claims it follows that H contains at least one SCC of size at least
(1/2 − 2δ)n > 3n/8. If H contains only one such component, then this component must
consist of truthful agents, and we can identify all of them (and hence also the types of all
their out-neighbors). Otherwise, there is another SCC of size at least (1/2 − 2δ)n, and as
there is no room for more than two such components, there are exactly two of them, say
V_1 and V_2. Note that by the properties of G there are edges of G from V_1 to V_2 and from
V_2 to V_1, and hence it is impossible that both of them are truthful components. As one of
them must be truthful, it follows that exactly one of them is truthful and one is corrupt.
We next show that we can identify the types of both components.

Recall that we have the SCCs of H, and the set T of all truthful vertices must be a
union of a subset of these SCCs. In addition, this set must be of size bigger than n/2 and
must be consistent with all reports of the vertices along every edge (in the sense that for
any edge (u, v) with u ∈ T, the report of u on v should be consistent with the actual type
of v).
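The consistency condition just described is easy to test, and the exhaustive search over unions of SCCs used at the end of this proof can be built on top of it. The Python below is a hypothetical illustration (helper names and the report encoding, a dictionary from arcs (u, v) to 'T' or 'B', are ours); it enumerates all 2^s unions, so it is exponential in the number s of SCCs.

```python
from itertools import product


def consistent_directed(arcs, rep, T):
    """T is consistent iff every arc leaving T carries the true type of its head."""
    return all(rep[(u, v)] == ('T' if v in T else 'B')
               for u, v in arcs if u in T)


def consistent_large_unions(n, arcs, rep, comps):
    """Brute force over all 2^s unions of SCCs (comps: list of vertex sets).
    Yields every union with more than n/2 vertices that is consistent."""
    for choice in product((False, True), repeat=len(comps)):
        T = set().union(*(c for c, keep in zip(comps, choice) if keep))
        if 2 * len(T) > n and consistent_directed(arcs, rep, T):
            yield T
```

In a small example where the corrupt SCC {3, 4} vouches for everyone, the union of both SCCs is rejected because vertex 0 reports vertex 3 as corrupt, so only the truthful SCC {0, 1, 2} survives.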

Claim 3.4 Given the strongly connected components V_1, V_2, ..., V_s of H and the reports
along each edge, either there is no union I_1 of SCCs including V_1 whose size exceeds n/2 so
that T = I_1, B = V − I_1 is consistent with all reports along the edges, or there is no union
I_2 of SCCs including V_2 whose size exceeds n/2 so that T = I_2, B = V − I_2 is consistent
with all reports along the edges.

Proof: Assume this is false, and let I_1, I_2 be as above. By the above discussion we know
that I_1 contains V_1 but not V_2, and I_2 contains V_2 but not V_1. Note that if some SCC V_i is
contained both in I_1 and in I_2 and there is any directed edge (u, v) from V_i to some other
SCC V_j, then if the report along this edge is that v is truthful, V_j must be a truthful
component in both I_1 and I_2. Similarly, if the report along this edge is v ∈ B, then
V_j must be outside I_1 and outside I_2. In particular, there are no edges at all from V_i to
V_1 or V_2 (as each of them lies in exactly one of the two unions I_1, I_2). Let J be the set
of all SCCs that are contained in both I_1 and I_2. By the remark above, for every edge (u, v)
from a vertex of J to a vertex outside J, the report along the edge must be v ∈ B (since
otherwise v would also be in an SCC which is truthful both in I_1 and in I_2 and hence
would be in J). Thus all edges (u, v) as above report v ∈ B, implying that all components
outside J to which there are directed edges from vertices in J belong to neither I_1 nor I_2.
By the properties of our graph the total size of these components exceeds that of J (as
|J| ≤ 4δn, since J is disjoint from V_1 ∪ V_2, and all out-neighbors of J are outside V_1, V_2),
and this shows that the sum of the sizes of I_1 and I_2 is at most

$$2|J| + \bigl(|V| - |J| - |N^+(J) - J|\bigr) \le |V|.$$

Therefore it cannot be that both I_1 and I_2 are of size bigger than n/2, proving the claim.

□
By the last claim it follows that one can identify the types of the SCCs $V_1$ and $V_2$. This
means that we can identify at least $(1/2 - 2\delta)n$ truthful vertices with no error. Recall that
this is the case also when $H$ has only one SCC of size at least $(1/2 - \delta)n$. Having these
truthful vertices, we also know the types of all their out-neighbors. By the assumption on
$G$ this gives the types of all vertices but fewer than $\delta n$, completing the proof of the main
part of the theorem.
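The final step, reading off types from the reports of vertices already known to be truthful, is a single pass over the reports. A minimal sketch, assuming reports are a dictionary keyed by directed edges `(u, v)` with `True` meaning "u says v is truthful"; the function name `types_from_truthful` is hypothetical.

```python
def types_from_truthful(S, reports):
    """Types certified by a set S of vertices already known to be truthful.

    Each report issued by a vertex of S is accurate, so it fixes the type of
    the reported-on out-neighbour. Returns a dict mapping vertex -> True
    (truthful) or False (corrupt); vertices not reached stay unknown.
    """
    types = {v: True for v in S}
    for (u, v), says_truthful in reports.items():
        if u in S:
            types[v] = says_truthful
    return types


# Hypothetical reports: vertex 0 is known truthful; vertex 3 is not certified.
reports = {(0, 1): True, (0, 2): False, (3, 0): False}
print(types_from_truthful({0}, reports))  # {0: True, 1: True, 2: False}
```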

The comment about the linear algorithm provided $|T| > (1/2 + 2\delta)n$ is clear. If we only
assume that $|T| > |B|$, the proof provides only a non-efficient algorithm for deciding the
types of the SCCs $V_1$ and $V_2$. Indeed, we have to check all $2^k$ possibilities of the types of
the SCCs and see which ones are consistent with all reports and are of total size
bigger than $n/2$. By the proof above, only one of the two SCCs $V_1, V_2$ will appear among
the truthful SCCs of such a possibility. □
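The exponential procedure can be sketched directly. This is an illustration under stated assumptions, not the authors' implementation: reports are a dictionary keyed by directed edges `(u, v)` with `True` meaning that $u$ reports $v$ truthful, and we assume (from our reading of the earlier construction, which is not part of this section) that $H$ has an edge $(u, v)$ exactly when $u$ reports $v$ truthful. SCCs are computed with Kosaraju's algorithm; all function names are hypothetical.

```python
from itertools import product


def sccs(n, edges):
    """Strongly connected components of a digraph on 0..n-1 (Kosaraju)."""
    adj = [[] for _ in range(n)]
    radj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)
    seen, order = [False] * n, []

    def visit(u):                 # first pass: record finishing order
        seen[u] = True
        for w in adj[u]:
            if not seen[w]:
                visit(w)
        order.append(u)

    for u in range(n):
        if not seen[u]:
            visit(u)
    comps, comp_of = [], [None] * n

    def collect(u, c):            # second pass on the reversed graph
        comp_of[u] = c
        comps[c].add(u)
        for w in radj[u]:
            if comp_of[w] is None:
                collect(w, c)

    for u in reversed(order):
        if comp_of[u] is None:
            comps.append(set())
            collect(u, len(comps) - 1)
    return comps


def consistent_majorities(n, reports):
    """Brute force over all 2^k unions of SCCs of H, keeping those of size
    greater than n/2 that are consistent with every report."""
    # Assumption: H has an edge (u, v) whenever u reports v as truthful.
    h_edges = [e for e, says_truthful in reports.items() if says_truthful]
    comps = sccs(n, h_edges)
    found = []
    for pick in product([False, True], repeat=len(comps)):
        T = set().union(*(C for C, chosen in zip(comps, pick) if chosen))
        if 2 * len(T) > n and all(ok == (v in T)
                                  for (u, v), ok in reports.items() if u in T):
            found.append(T)
    return found


# Hypothetical instance: 0, 1, 2 truthful; 3 and 4 corrupt and colluding.
reports = {(0, 1): True, (1, 2): True, (2, 0): True, (0, 3): False,
           (3, 4): True, (4, 3): True, (3, 0): False, (4, 1): False}
print(consistent_majorities(5, reports))  # [{0, 1, 2}]
```

The loop visits all $2^k$ unions, so this is only practical when $H$ has few SCCs, which is consistent with the remark that this case yields a non-efficient algorithm.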

4 Discussion and Open Problems 

Our results show that for sufficiently strong expanders it is possible to find most of the
truthful and most of the corrupt nodes even if there is only one more truthful node than
untruthful ones. In particular, this is possible for some very sparse, bounded degree graphs.
This is in sharp contrast to the known results about the PMC model, which show that if we
want to identify all corrupt vertices when their number is linear in the number of vertices,
we need dense graphs with a quadratic number of edges.

We have also seen that for graphs with bad expansion properties, like a grid or any 
planar graph, it is impossible to identify even a single truthful node even when there is 
a very high percentage of truthful nodes. It is interesting to study in more detail the 
relation between expansion and corruption. 

Question 4.1 Provide sharp criteria in terms of expansion and the fractional size of the 
set T for enabling corruption detection. 

To illustrate an example of such a result, consider the following argument. We say
that an undirected graph $G$ is $\delta$-connected if for every two disjoint sets $A_1, A_2$
with $|A_1| \ge \delta n$ and $|A_2| \ge (1 - 3\delta)n$ there is at least one edge between $A_1$ and $A_2$.
Note that the notion of $\delta$-connectedness is much weaker than expansion. In particular,
a graph $G$ can be $\delta$-connected, yet at the same time have $\delta n/2$ isolated vertices.
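For small graphs the definition of $\delta$-connectedness can be checked by brute force. The sketch below is our own illustration (the function name and edge-list input are assumptions). It uses the observation that shrinking $A_1$ or $A_2$ never creates an edge between them, so only sets $A_1$ of the minimum size need to be tried, and for each such $A_1$ the largest admissible $A_2$ is everything outside $A_1$ and its neighborhood. The example checks the remark above for $n = 8$, $\delta = 1/4$: one isolated vertex ($\delta n/2 = 1$) does not prevent $\delta$-connectedness.

```python
from fractions import Fraction
from itertools import combinations
from math import ceil


def is_delta_connected(n, edges, delta):
    """Brute-force test of delta-connectedness for a small undirected graph.

    Pass delta (assumed > 0) as a Fraction so that delta*n is exact.
    Only sets A1 of the smallest allowed size are tried; for each, the
    largest admissible A2 is V minus A1 and all neighbours of A1.
    """
    delta = Fraction(delta)
    nbrs = [set() for _ in range(n)]
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    a = ceil(delta * n)              # minimum size of A1
    b = ceil((1 - 3 * delta) * n)    # minimum size of A2
    for A1 in combinations(range(n), a):
        blocked = set(A1).union(*(nbrs[u] for u in A1))
        if n - len(blocked) >= b:    # A2 = V minus blocked has no edge to A1
            return False
    return True


# K_7 plus one isolated vertex (vertex 7): still delta-connected for delta = 1/4.
k7_plus_isolated = list(combinations(range(7), 2))
print(is_delta_connected(8, k7_plus_isolated, Fraction(1, 4)))  # True
# Two disjoint copies of K_4 are not.
two_k4 = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2))
print(is_delta_connected(8, two_k4, Fraction(1, 4)))  # False
```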

Claim 4.2 Suppose that $|T| = (1 - \epsilon)n$ and the graph $G$ is $\epsilon$-connected. Then it is possible
to identify $T' \subseteq T$ of size at least $(1 - 2\epsilon)n$.
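Proof: Let $E' \subseteq E$ be the set of edges both of whose end-points declare each other
truthful. Recall that each connected component of $G' = (V, E')$ is either truthful or
corrupt.

Let $T_1, T_2, \ldots$ denote all the components of $G'$ of size greater than $\epsilon n$; since
$|B| = \epsilon n$, all of them are truthful. We claim that if $T' = \bigcup_i T_i$ then
$|T \setminus T'| < \epsilon n$, and hence $|T'| > (1 - 2\epsilon)n$. Assume otherwise. Since all the
connected components of $G'$ inside $T \setminus T'$ are of size at most $\epsilon n$, adding them one at
a time until the total size first reaches $\epsilon n$ gives a set $T'' \subseteq T \setminus T'$ of size in
$[\epsilon n, 2\epsilon n]$. There are no edges of $G$ between $T''$ and $T \setminus T''$ (an edge joining two
truthful vertices would belong to $E'$), and $|T \setminus T''| \in [(1 - 3\epsilon)n, (1 - 2\epsilon)n]$. This is
a contradiction to $\epsilon$-connectedness and the proof follows. □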
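The procedure in this proof is easy to carry out. A minimal sketch, assuming an undirected $G$ given as an edge list, reports stored for both directions of every edge, and the threshold "more than $\epsilon n$ vertices" (a component of exactly $\epsilon n$ vertices could consist of all of $B$). The function name and the instance are hypothetical; components of $G'$ are found with union-find.

```python
from fractions import Fraction
from itertools import combinations, permutations


def certified_truthful(n, edges, reports, eps):
    """Sketch of the procedure in the proof of Claim 4.2 (undirected G).

    reports[(u, v)] is True when u declares its neighbour v truthful.
    Keep the edges whose endpoints vouch for each other (the graph G'),
    find the connected components of G' with union-find, and return the
    union of the components with more than eps*n vertices.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v in edges:
        if reports[(u, v)] and reports[(v, u)]:
            parent[find(u)] = find(v)
    comps = {}
    for v in range(n):
        comps.setdefault(find(v), set()).add(v)
    threshold = Fraction(eps) * n
    return set().union(*(C for C in comps.values() if len(C) > threshold))


# Hypothetical instance: G = K_10, eps = 1/10, vertex 9 corrupt and
# vouching for everyone, all other vertices truthful.
n, T = 10, set(range(9))
edges = list(combinations(range(n), 2))
reports = {(u, v): (v in T) if u in T else True
           for u, v in permutations(range(n), 2)}
print(sorted(certified_truthful(n, edges, reports, Fraction(1, 10))))
# prints [0, 1, 2, 3, 4, 5, 6, 7, 8]
```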

To see that the conditions of Claim 4.2 are tight up to constant factors, consider the star
graph with $m$ leaves. Assume that $|T| < m - 1$. Then it is easy to see that one cannot
find even one member of $T$ if all vertices declare all their neighbors corrupt. On the other
hand, this example is (vacuously) $1/(4m)$-connected. To get a non-trivial example, one
can replace each node with a complete graph on $k$ vertices and each edge with a complete
bipartite graph $K_{k,k}$, for an arbitrary $k$.
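The star example can be verified by brute force for small $m$. The sketch below (hypothetical function name; reports keyed by directed pairs `(u, v)`, `True` meaning "u says v is truthful") lists every set of size $t = |T|$ that is consistent with the reports when every vertex calls all its neighbors corrupt, and checks that no vertex lies in all of them.

```python
from itertools import combinations


def consistent_sets_of_size(n, reports, t):
    """All vertex sets of size exactly t consistent with every report
    (brute force; suitable only for tiny graphs)."""
    return [set(T) for T in combinations(range(n), t)
            if all(ok == (v in T) for (u, v), ok in reports.items()
                   if u in T)]


# Star with centre 0 and m leaves 1..m; every vertex calls all neighbours corrupt.
m, t = 5, 3                        # t = |T| < m - 1
reports = {}
for leaf in range(1, m + 1):
    reports[(0, leaf)] = False
    reports[(leaf, 0)] = False
cands = consistent_sets_of_size(m + 1, reports, t)
print(len(cands))                  # 10: every 3-subset of the leaves
print(set.intersection(*cands))    # set(): no vertex is certainly truthful
```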

We conclude with a short discussion of a variant of the model. From the modeling 
perspective, it is interesting to consider probabilistic aspects of the corruption detection 
problem. 

Question 4.3 What is the effect of relaxing the assumption that truthful nodes always 
report the status correctly? Suppose for example that each truthful node reports the status 
of each of its neighbors accurately with probability $1 - \epsilon$, independently. Note that in this
case it is impossible to detect the status of an individual node with probability one. However,
it is still desirable to find sets $T'$ and $B'$ such that the symmetric differences $T \triangle T'$ and
$B \triangle B'$ are small with high probability. Under what conditions can this be achieved? What
are good algorithms for finding $T'$ and $B'$?
A The Machine Testing Machine Puzzle

A byproduct of the line of research initiated by [15] is the following beautiful puzzle. 
We have not been able to locate the exact source of the puzzle. Consider a factory that 
produces machines. The machines are used to test other machines. We call a machine 
truthful if it functions properly and corrupt otherwise. Given a batch of 100 machines,
exactly 51 of which are truthful: 

• Can you find all of the 51 truthful machines in the batch? 

• How can this be achieved using the minimal number of tests? 

It is not hard to see that the answer to the first item is yes. The second part of the puzzle 
is a bit more challenging (and is left to entertain those readers who have not seen the 
puzzle before). Note that 

• The machine problem above is a special case of the corruption detection problem 
on the complete graph on 100 vertices with exactly 51 truthful agents. However, in 
this problem we allow adaptive algorithms, that is, algorithms that select each test 
(among all edges of the complete graph) based on the results of the previous tests, 
whereas in our corruption detection problem we consider nonadaptive ones. 

• It is clear that if the number of corrupt machines is at least 50, then it is impossible 
to detect even one truthful machine. For example, in the case where there are 50 
machines of each type we may consider the following strategy of corrupt machines. 
A corrupt machine will report a corrupt machine as truthful and a truthful machine as
corrupt. It is clear that in this case there is no way to distinguish between the
corrupt and truthful machines. 
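The indistinguishability argument can be checked mechanically: under this strategy a machine reports "truthful" exactly when the tested machine has the same type as itself, so exchanging the two classes leaves every possible test outcome unchanged, whatever tests an adaptive algorithm chooses. The sketch below (function names are hypothetical) compares all 100 * 99 test outcomes before and after the exchange.

```python
def report(tester_truthful, subject_truthful):
    """Report under the strategy above: a truthful tester tells the truth and
    a corrupt tester reports the opposite type, so the answer is 'truthful'
    exactly when tester and subject have the same type."""
    return subject_truthful if tester_truthful else not subject_truthful


def all_reports(types):
    """Outcome of every possible test (u tests v) for a given type assignment."""
    n = len(types)
    return {(u, v): report(types[u], types[v])
            for u in range(n) for v in range(n) if u != v}


types = [i % 2 == 0 for i in range(100)]   # 50 truthful, 50 corrupt
swapped = [not x for x in types]            # exchange the two classes
print(all_reports(types) == all_reports(swapped))  # True
```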